Cell Magic Tutorial

Interaction with MLDB occurs via a REST API. Calling a REST API over HTTP from a Notebook interface can be laborious if you use a general-purpose Python library like requests directly, so MLDB comes with a Python library called pymldb to ease the pain.

pymldb does this in three ways:

  • the %mldb magics: these are Jupyter line- and cell-magic commands which allow you to make raw HTTP calls to MLDB, and which also provide some higher-level functions. This tutorial shows you how to use them.
  • the Python Resource class: this is a simple class which wraps the requests library to make HTTP calls to the MLDB API friendlier in a Notebook environment. Check out the Resource Wrapper Tutorial for more info on the Resource class.
  • the Python BatFrame class: this is a class that behaves like the Pandas DataFrame but offloads computation to the server via HTTP calls. Check out the BatFrame Tutorial for more info on the BatFrame.

The %mldb Magic System

Basic Magic

We'll start by initializing the %mldb magic system:


In [1]:
%reload_ext pymldb


mldb magic initialized with host as http://localhost

And then we'll ask it for some help:


In [2]:
%mldb help


Usage:

  Line magic functions:

    %mldb help          
                        Print this message
    
    %mldb init <url>    
                        Initialize the plugins for the cell magics.
                        Extension comes pre-initialized with <uri> 
                        set to "http://localhost"
    
    %mldb doc <kind>/<type>    
                        Shows documentation in an iframe. <kind> can
                        be one of "datasets", "functions", "procedures" or
                        "plugins" and <type> can be one of the installed
                        types, e.g. procedures/classifier. NB this will 
                        only work with an MLDB-hosted Notebook for now.

    %mldb query <sql>
                        Run an SQL-like query and return a pandas 
                        DataFrame. Dataset selection is done via the 
                        FROM clause.

    %mldb loadcsv <dataset> <url>
                        Create a dataset with id <dataset> from a CSV
                        hosted at the HTTP url <url>.
                        
    %mldb py <uri> <json args>
                        Run a python script named "main.py" from <uri>
                        and pass in <json args> as arguments.
                        <uri> can be one of:
                          - file://<rest of the uri>: a local directory
                          - gist://<rest of the uri>: a gist
                          - git://<rest of the uri>: a public git repo
                          - http(s)://<rest of the uri>: a file on the web

    %mldb pyplugin <name> <uri>
                        Load a python plugin called <name> from <uri> 
                        by executing its main.py. Any pre-existing plugin
                        called <name> will be deleted first.
                        <uri> can be one of:
                          - file://<rest of the uri>: a local directory
                          - gist://<rest of the uri>: a gist
                          - git://<rest of the uri>: a public git repo
                          - http(s)://<rest of the uri>: a file on the web
                          
    %mldb GET <route>
    %mldb DELETE <route>
                        HTTP GET/DELETE request to <route>. <route> should
                        start with a '/'.
                        
    %mldb GET <route> <json query params>
                        HTTP GET request to <route>, JSON will be used to
                        create query string. <route> should start with a '/'.
                        
    %mldb PUT <route> <json>
    %mldb POST <route> <json>
                        HTTP PUT/POST request to <route>, <json> will
                        be sent as JSON payload. <route> should start
                        with a '/'.
       
                        
  Cell magic functions:

    %%mldb py <json args>
    <python code>
                        Run a python script in MLDB from the cell body.
    
    %%mldb query
    <sql>
                        Run an SQL-like query from the cell body and return
                        a pandas DataFrame. Dataset selection is done via
                        the FROM clause.
    
    %%mldb loadcsv <dataset>
    <csv>
                        Create a dataset with id <dataset> from a CSV
                        in the cell body.
                        
    %%mldb GET <route>
    <json query params>
                        HTTP GET request to <route>, cell body will be
                        parsed as JSON and used to create query string.
                        <route> should start with a '/'.
                        
    %%mldb PUT <route>
    <json>
    %%mldb POST <route>
    <json>
                        HTTP PUT/POST request to <route>, cell body will
                        be sent as JSON payload. <route> should start
                        with a '/'.
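
The help text says that the JSON passed to a GET call is used to create the query string. As a rough illustration of that idea (not pymldb's actual implementation, which this tutorial does not show), the sketch below turns a JSON object into a query string with Python 3's standard library. The helper name build_get_url, the /v1/query route and the q and format parameter names are assumptions made for illustration only.

```python
import json
from urllib.parse import urlencode


def build_get_url(host, route, json_params):
    """Join host and route, then append a JSON object as a query string."""
    if not route.startswith("/"):
        raise ValueError("route should start with a '/'")
    params = json.loads(json_params)
    # Assumption: strings are sent as-is, anything else is re-serialized as JSON.
    flat = {
        key: value if isinstance(value, str) else json.dumps(value)
        for key, value in params.items()
    }
    return host + route + "?" + urlencode(flat)


# Hypothetical route and parameters, for illustration only.
url = build_get_url(
    "http://localhost",
    "/v1/query",
    '{"q": "select * from titanic limit 5", "format": "table"}',
)
print(url)
```

Spaces and other reserved characters in the values are percent-encoded by urlencode, which is the part that is tedious to get right by hand.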

The most basic way in which the %mldb magic can help us with MLDB's REST API is by allowing us to type natural-feeling REST commands, like this one, which will list all of the available dataset types:


In [3]:
%mldb GET /v1/types/datasets


Out[3]:
GET http://localhost/v1/types/datasets
200 OK
[
  "beh", 
  "beh.binary", 
  "beh.live", 
  "beh.mutable", 
  "beh.ranged", 
  "embedding", 
  "merged", 
  "sqliteSparse", 
  "transposed"
]

You can use similar syntax to make PUT, POST and DELETE requests as well.
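
For comparison, here is a rough sketch of what building such requests by hand could look like with Python 3's standard library alone. The requests are constructed but deliberately not sent, so the code runs without an MLDB server. The helper name make_request, the /v1/datasets/example route and the PUT payload are hypothetical; the host is the one reported when the magic was initialized.

```python
import json
from urllib.request import Request

HOST = "http://localhost"  # host reported when the magic was initialized


def make_request(method, route, payload=None):
    """Build (but do not send) an HTTP request for an MLDB route."""
    if not route.startswith("/"):
        raise ValueError("route should start with a '/'")
    data = None
    headers = {}
    if payload is not None:
        # Serialize the payload as a JSON body.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(HOST + route, data=data, headers=headers, method=method)


get_req = make_request("GET", "/v1/types/datasets")
# Hypothetical route and payload, for illustration only.
put_req = make_request("PUT", "/v1/datasets/example", {"type": "beh.mutable"})

for req in (get_req, put_req):
    print(req.get_method(), req.full_url, req.data)
# Passing a request to urllib.request.urlopen() would actually send it.
```

The magic collapses all of this into a single line such as the GET shown above, which is the convenience it is meant to provide.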

Advanced Magic

The %mldb magic system also includes syntax to do more advanced operations like loading and querying data. Let's load the dataset from the Predicting Titanic Survival demo with a single command (after deleting it first if it's already loaded):


In [4]:
%mldb DELETE /v1/datasets/titanic
%mldb loadcsv titanic https://raw.githubusercontent.com/datacratic/mldb-pytanic-plugin/master/titanic_train.csv


Success!

And now let's run an SQL query on it:


In [5]:
%mldb query select * from titanic limit 5


Out[5]:
_rowName  Age  Cabin   Embarked  Fare     Name                         Parch  PassengerId  Pclass  Sex   SibSp  Ticket    label
0         22           S         7.25     BraundMr.OwenHarris          0      1            3       male  1      A/521171  0
97        23   D10D12  C         63.3583  GreenfieldMr.WilliamBertram  1      98           1       male  0      PC17759   1
273       37   C118    C         29.7     NatschMr.CharlesH            1      274          1       male  0      PC17596   0
524                    C         7.2292   KassemMr.Fared               0      525          3       male  0      2700      0
278       7            Q         29.125   RiceMaster.Eric              1      279          3       male  4      382652    0

We can get the results out as a Pandas DataFrame just as easily:


In [6]:
df = %mldb query select * from titanic
type(df)


Out[6]:
pandas.core.frame.DataFrame
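
Once the result is a DataFrame, the usual pandas operations apply. The sketch below does not talk to MLDB: it builds a small stand-in DataFrame from three columns of the five rows shown in Out[5], so that it runs anywhere pandas is installed, and then filters and aggregates it. In a live Notebook, df would instead come from the %mldb query line, and the row names might not be strings as assumed here.

```python
import pandas as pd

# Stand-in for the result of `%mldb query`: three columns from the five rows
# shown in Out[5]. Treating the row names as strings is an assumption.
df = pd.DataFrame(
    {
        "Fare": [7.25, 63.3583, 29.7, 7.2292, 29.125],
        "Pclass": [3, 1, 1, 3, 3],
        "label": [0, 1, 0, 0, 0],
    },
    index=pd.Index(["0", "97", "273", "524", "278"], name="_rowName"),
)

# Select first-class passengers with a boolean mask.
first_class = df[df["Pclass"] == 1]

# Average fare per passenger class.
mean_fare = df.groupby("Pclass")["Fare"].mean()

print(first_class)
print(mean_fare)
```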

Server-Side Python Magic

Python code which is executed in a normal Notebook cell runs within the Notebook's Python interpreter. MLDB also supports sending Python scripts via HTTP for execution within its own in-process Python interpreter. Server-side Python code gets access to a high-performance version of the REST API which bypasses HTTP, via an mldb.perform() function.

There's an %mldb magic command for running server-side Python code from the comfort of your Notebook:


In [7]:
%%mldb py

# this code will run on the server!
print mldb.perform("GET", "/v1/types/datasets", [], {})["response"]


["beh","beh.binary","beh.live","beh.mutable","beh.ranged","embedding","merged","sqliteSparse","transposed"]
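
Judging by its compact formatting, the value printed above is a JSON string rather than a Python list, so code that wants to work with the individual type names would decode it first. A minimal sketch, using a hard-coded copy of the output above as a stand-in because the mldb object only exists inside MLDB's own interpreter:

```python
import json

# Stand-in for mldb.perform("GET", "/v1/types/datasets", [], {})["response"],
# copied from the output above.
raw_response = (
    '["beh","beh.binary","beh.live","beh.mutable","beh.ranged",'
    '"embedding","merged","sqliteSparse","transposed"]'
)

# Decode the JSON array into a Python list of type names.
dataset_types = json.loads(raw_response)

# Keep only the "beh" family of dataset types.
beh_types = [name for name in dataset_types if name.startswith("beh")]
print(beh_types)
```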

Putting it all together

Now that you've seen the basics, check out the Mapping Reddit demo to see how to use the %mldb magic system to do machine learning with MLDB.

